Representing Pattern Matching Algorithms by Polynomial-Size Automata

Authors

  • Tobias Marschall
  • Noemi E. Passing
Abstract

Pattern matching algorithms to find exact occurrences of a pattern S ∈ Σ^m in a text T ∈ Σ^n have been analyzed extensively with respect to asymptotic best, worst, and average case runtime. For more detailed analyses, the number of text character accesses X_n performed by an algorithm A when searching a random text of length n for a fixed pattern S has been considered. Constructing a state space and corresponding transition rules (e.g. in a Markov chain) that reflect the behavior of a pattern matching algorithm is a key step in existing analyses of X_n in both the asymptotic (n → ∞) and the non-asymptotic regime. The size of this state space is hence a crucial parameter for such analyses. In this paper, we introduce a general methodology to construct corresponding state spaces and demonstrate that it applies to a wide range of algorithms, including Boyer-Moore (BM), Boyer-Moore-Horspool (BMH), Backward Oracle Matching (BOM), and Backward (Non-Deterministic) DAWG Matching (B(N)DM). In all cases except BOM, our method leads to state spaces of size O(m) for pattern length m, a result that has previously only been obtained for BMH. In all other cases, only state spaces with size exponential in m had been reported. Our results immediately imply an algorithm to compute the distribution of X_n for fixed S, fixed n, and A ∈ {BM, BMH, B(N)DM} in polynomial time for a very general class of random text models.
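To make the quantity X_n concrete, the following is a minimal sketch that counts text character accesses for a simple Horspool (BMH) variant and obtains the distribution of X_n by brute-force enumeration of all texts of length n under a uniform i.i.d. text model. This is not the automaton construction of the paper: the enumeration is exponential in n and is only meant to illustrate what is being computed. The function names `horspool_accesses` and `access_distribution`, and the choice of a uniform model, are assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def horspool_accesses(pattern: str, text: str) -> int:
    """Count text character accesses of a simple Horspool variant (illustrative)."""
    m, n = len(pattern), len(text)
    # Shift table: distance from the last occurrence of c in pattern[:-1]
    # to the end of the pattern; characters not in the table shift by m.
    shift = {c: m - 1 - i for i, c in enumerate(pattern[:-1])}
    accesses = 0
    pos = 0  # left end of the current window
    while pos + m <= n:
        # Compare right to left; each comparison reads one text character.
        j = m - 1
        while j >= 0:
            accesses += 1
            if text[pos + j] != pattern[j]:
                break
            j -= 1
        # The rightmost window character was already read above,
        # so looking it up for the shift is not counted again.
        pos += shift.get(text[pos + m - 1], m)
    return accesses


def access_distribution(pattern: str, alphabet: str, n: int) -> dict:
    """Distribution of X_n under a uniform i.i.d. text model (assumed model).

    Enumerates all len(alphabet)**n texts, so it is only usable for small n.
    """
    counts = Counter(
        horspool_accesses(pattern, "".join(chars))
        for chars in product(alphabet, repeat=n)
    )
    total = len(alphabet) ** n
    return {x: c / total for x, c in sorted(counts.items())}


if __name__ == "__main__":
    # Pattern "ab" over the alphabet {a, b}, texts of length 2.
    print(access_distribution("ab", "ab", 2))  # {1: 0.5, 2: 0.5}
```

The polynomial-size state spaces described in the abstract are what make it possible to avoid this exhaustive enumeration for fixed S and n.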


Similar resources

About the Size of Boyer-moore Automata

We study the size of Boyer-Moore automata introduced in Knuth, Morris & Pratt's famous paper on pattern matching. We experimentally exhibit a finite class of binary patterns, which produce large Boyer-Moore automata. The best approximation curve for their sizes is a polynomial O(m^7), or even an exponential O(2^{0.4m}), in the length m of the patterns. All the previously known maximal sizes were at...


Exact Analysis of Pattern Matching Algorithms with Probabilistic Arithmetic Automata

We propose a framework for the exact probabilistic analysis of window-based pattern matching algorithms, such as Boyer-Moore, Horspool, Backward DAWG Matching, Backward Oracle Matching, and more. In particular, we show how to efficiently obtain the distribution of such an algorithm’s running time cost for any given pattern in a random text model, which can be quite general, from simple uniform ...


An Algorithm to Compute the Character Access Count Distribution for Pattern Matching Algorithms

We propose a framework for the exact probabilistic analysis of window-based pattern matching algorithms, such as Boyer–Moore, Horspool, Backward DAWG Matching, Backward Oracle Matching, and more. In particular, we develop an algorithm that efficiently computes the distribution of a pattern matching algorithm’s running time cost (such as the number of text character accesses) for any given patte...


The Compression of Subsegments of Images

We investigate how the size of the compressed version of a 2-dimensional image changes when we cut off a part of it, e.g. extracting a photo of one person from a photo of a group of people. 2-dimensional compression is considered in terms of finite automata. Let n be the size of the smallest acyclic automaton which describes an image T. We show that the tight bound for the compression size of a su...


On the Synthesis of Strategies in Infinite Games

  • Completeness and Weak Completeness Under Polynomial-Size Circuits
  • Communication Complexity of Key Agreement on Small Ranges
  • Pseudorandom Generators and the Frequency of Simplicity
  • Classes of Bounded Counting Type and their Inclusion Relations
  • Lower Bounds for Depth-Three Circuits With Equals and Mod-Gates
  • On Realizing Iterated Multiplication by Small Depth Threshol...



Journal title:
  • CoRR

Volume abs/1607.00138

Publication date 2016